Bringing Elastic MapReduce to Scientific Clouds

نویسندگان

  • Pierre Riteau
  • Kate Keahey
  • Christine Morin
چکیده

The MapReduce programming model, proposed by Google, offers a simple and efficient way to perform distributed computation over large data sets. The Apache Hadoop framework is a free and open-source implementation of MapReduce. To simplify the usage of Hadoop, Amazon Web Services provides Elastic MapReduce, a web service that enables users to submit MapReduce jobs. Elastic MapReduce takes care of resource provisioning, Hadoop configuration and performance tuning, data staging, fault tolerance, etc. This service drastically reduces the entry barrier to perform MapReduce computations in the cloud. However, Elastic MapReduce is limited to using Amazon EC2 resources, and requires an extra fee. In this paper, we present our work towards creating an implementation of Elastic MapReduce which is able to use resources from other clouds than Amazon EC2, such as scientific clouds. This work will also serve as a foundation for more advanced experiments, such as performing MapReduce computations over multiple distributed clouds.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Resilin: Elastic MapReduce for Private and Community Clouds

The MapReduce programming model, introduced by Google, offers a simple and efficient way of performing distributed computation over large data sets. Although Google’s implementation is proprietary, MapReduce can be leveraged by anyone using the free and open source Apache Hadoop framework. To simplify the usage of Hadoop in the cloud, Amazon Web Services offers Elastic MapReduce, a web service ...

متن کامل

Having a ChuQL at XML on the Cloud

MapReduce/Hadoop has gained acceptance as a framework to process, transform, integrate, and analyze massive amounts of Web data on the Cloud. The MapReduce model (simple, fault tolerant, data parallelism on elastic clouds of commodity servers) is also attractive for processing enterprise and scientific data. Despite XML ubiquity, there is yet little support for XML processing on top of MapReduc...

متن کامل

Iterative MapReduce for Large Scale Machine Learning

Large datasets (“Big Data”) are becoming ubiquitous because the potential value in deriving insights from data, across a wide range of business and scientific applications, is increasingly recognized. The data growth has been accompanied by rapid adoption of large, elastic, multi-tenanted computing clusters (“compute clouds”), leading to a virtuous cycle: the scalability of cloud computing make...

متن کامل

Data Mining Using Clouds: An Experimental Implementation of Apriori over MapReduce

Cloud computing has become a viable mainstream solution for data processing, storage and distribution. It promises on demand, scalable, pay-as-you-go compute and storage capacity. To analyze “big data” on clouds, it is very important to research data mining strategies based on cloud computing paradigm from both theoretical and practical views. For this purpose, we study a strategy of data minin...

متن کامل

High Performance Parallel Computing with Clouds and Cloud Technologies

Infrastructure services (Infrastructure-as-a-service), provided by cloud vendors, allow any user to provision a large number of compute instances fairly easily. Whether leased from public clouds or allocated from private clouds, utilizing these virtual resources to perform data/compute intensive analyses requires employing different parallel runtimes to implement such applications. Among many p...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011